Comparison of Density Estimation Methods for Astronomical Datasets
Abstract
Context. Galaxies are strongly influenced by their environment. Quantifying the galaxy density is a difficult but critical step in studying the properties of galaxies. Aims. We aim to determine differences between density estimation methods and their applicability to astronomical problems. We study the performance of four density estimation techniques: k-nearest neighbors (kNN), adaptive Gaussian kernel density estimation (DEDICA), the Modified Breiman Estimator (MBE), which is a special case of adaptive Epanechnikov kernel density estimation, and the Delaunay tessellation field estimator (DTFE). Methods. The density estimators are applied to six artificial datasets and three astronomical datasets: the Millennium Simulation and two samples from the Sloan Digital Sky Survey (SDSS). We compare the performance of the methods in two ways: first, for the artificial datasets, by measuring the integrated squared error and Kullback–Leibler divergence of each method with respect to the parametric densities of the datasets; second, for the SDSS datasets, by examining the applicability of the densities to studying the properties of galaxies in relation to their environment. Results. The adaptive-kernel-based methods, especially MBE, estimate the density more accurately than the other methods and have stronger predictive power in astronomical use cases. Conclusions. We recommend the Modified Breiman Estimator as a fast and reliable method to quantify the environment of galaxies.
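To make the two error measures concrete, the following is a minimal sketch, not the paper's implementation. It applies a one-dimensional kNN density estimate to samples from a known Gaussian and scores it with the integrated squared error and the Kullback–Leibler divergence. The function names, the choice of k = 50, the sample size, the evaluation grid, and the direction of the divergence (true density relative to the estimate) are all assumptions made for illustration.

```python
import numpy as np

def knn_density_1d(samples, query, k):
    """1D kNN density estimate: p(x) = k / (N * 2 * r_k(x)),
    where r_k(x) is the distance from x to its k-th nearest sample."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Distances from every query point to every sample, sorted per query point.
    dists = np.sort(np.abs(query[:, None] - samples[None, :]), axis=1)
    r_k = dists[:, k - 1]
    return k / (samples.size * 2.0 * r_k)

def integrated_squared_error(p_est, p_true, grid):
    """Approximate the integral of (p_est - p_true)^2 by a Riemann sum
    on a uniformly spaced grid."""
    dx = grid[1] - grid[0]
    return float(np.sum((p_est - p_true) ** 2) * dx)

def kl_divergence(p_true, p_est, grid):
    """Approximate the integral of p_true * log(p_true / p_est) by a
    Riemann sum on a uniformly spaced grid (direction is an assumption)."""
    dx = grid[1] - grid[0]
    return float(np.sum(p_true * np.log(p_true / p_est)) * dx)

# Illustrative run: standard normal samples, evaluated on [-3, 3].
rng = np.random.default_rng(0)
samples = rng.normal(size=2000)
grid = np.linspace(-3.0, 3.0, 201)
p_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
p_knn = knn_density_1d(samples, grid, k=50)

print("ISE:", integrated_squared_error(p_knn, p_true, grid))
print("KL :", kl_divergence(p_true, p_knn, grid))
```

Lower values of both measures indicate a closer match to the parametric density. Because the kNN estimate is not normalized on a finite grid, the approximate divergence here is only indicative.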
Similar resources
Computational AstroStatistics: Fast Algorithms and Efficient Statistics for Density Estimation in Large Astronomical Datasets
We present initial results on the use of Mixture Models for density estimation in large astronomical databases. We provide herein both the theoretical and experimental background for using a mixture model of Gaussians based on the Expectation Maximization (EM) Algorithm. Applying these analyses to simulated data sets we show that the EM algorithm – using both the AIC & BIC penalized likelih...
Comparison of the Gamma kernel and the orthogonal series methods of density estimation
The standard kernel density estimator suffers from a boundary bias issue for probability density function of distributions on the positive real line. The Gamma kernel estimators and orthogonal series estimators are two alternatives which are free of boundary bias. In this paper, a simulation study is conducted to compare small-sample performance of the Gamma kernel estimators and the orthog...
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern, powerful tools that can solve many important problems people face today. Support vector regression (SVR), a member of the machine learning family, is a way to build a regression model. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
Scaling and Fractal Concepts in Saturated Hydraulic Conductivity: Comparison of Some Models
Measurement of soil saturated hydraulic conductivity, Ks, is normally affected by flow patterns such as macro pore; however, most current techniques do not differentiate flow types, causing major problems in describing water and chemical flows within the soil matrix. This study compares eight models for scaling Ks and predicted matrix and macro pore Ks, using a database composed of 50 datasets...
Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation
We consider the problem of density estimation when the data is in the form of a continuous stream with no fixed length. In this setting, implementations of the usual methods of density estimation such as kernel density estimation are problematic. We propose a method of density estimation for massive datasets that is based upon taking the derivative of a smooth curve that has been fit through a ...